Machine Speech Chain with One-shot Speaker Adaptation

نویسندگان

Andros Tjandra

Sakriani Sakti

Satoshi Nakamura

چکیده

In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components to mutually improve their performance. This was accomplished by the two parts teaching each other using both labeled and unlabeled data. This approach could significantly improve model performance within a single-speaker speech dataset, but only a slight increase could be gained in multi-speaker tasks. Furthermore, the model is still unable to handle unseen speakers. In this paper, we present a new speech chain mechanism by integrating a speaker recognition model inside the loop. We also propose extending the capability of TTS to handle unseen speakers by implementing one-shot speaker adaptation. This enables TTS to mimic voice characteristics from one speaker to another with only a one-shot speaker sample, even from a text without any speaker information. In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speakers characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

متن کامل

Gender Clustering Improvement for Speaker Adaptation Using LDA

Speaker adaptation is an important issue in current speech recognition researches. Speaker clustering is one of the widely used methods in speaker adaptation. In this paper the effect of Linear Discriminant Analysis (LDA) on increasing the accuracy of some clustering methods such as k-means and Support Vector Machine (SVM) is demonstrated. The performance of this idea was examined on AURORA 2.0...

متن کامل

Explorer Unsupervised cross - lingual speaker adaptation for HMM - based speech synthesis

In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a wordbased large-vocabulary continuous speech recognizer...

متن کامل

State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition

This paper describes the results of our experiments in building speaker-adaptive recognizers for talkers with spastic dysarthria. We study two modifications – (a) MAP adaptation of speaker-independent systems trained on normal speech and, (b) using a transition probability matrix that is a linear interpolation between fully ergodic and (exclusively) leftto-right structures, for both speaker-dep...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2018

Machine Speech Chain with One-shot Speaker Adaptation

نویسندگان

چکیده

منابع مشابه

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

Gender Clustering Improvement for Speaker Adaptation Using LDA

Explorer Unsupervised cross - lingual speaker adaptation for HMM - based speech synthesis

State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition

عنوان ژورنال:

اشتراک گذاری